{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[View the runnable example on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/inference/pytorch/pytorch_context_manager.ipynb)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Automatic Inference Context Management by `get_context`"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can use ``InferenceOptimizer.get_context(model=...)`` API to enable automatic context management for PyTorch inference. With only one line of code change, BigDL-Nano will automatically provide suitable context management for each accelerated model optimized by ``InferenceOptimizer.trace``/``quantize``/``optimize``, it usually contains part of or all of following four types of context manager:\n",
    "\n",
    "1. ``torch.inference_mode(True)`` to disable gradients, which will be used for all models. For the case when ``torch <= 1.12``, ``torch.no_grad()`` will be used for PyTorch mixed precision inference as a replacement of ``torch.inference_mode(True)``\n",
    "   \n",
    "2. ``torch.cpu.amp.autocast(dtype=torch.bfloat16)`` to run in mixed precision, which will be provided for bf16 related model\n",
    "   \n",
    "3. ``torch.set_num_threads()`` to control thread number, which will be used only if you specify ``thread_num`` when applying ``InferenceOptimizer.trace``/``quantize``/``optimize``\n",
    "\n",
    "4. ``torch.jit.enable_onednn_fusion(True)`` to support ONEDNN fusion for jit when using jit as accelerator"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "source": [
    "To do inference using Bigdl-nano InferenceOptimizer, you need to install BigDL-Nano for PyTorch inference first. We recommend you to use [Miniconda](https://docs.conda.io/en/latest/miniconda.html) to prepare the environment and install the following packages in a conda environment.\n",
    "\n",
    "You can create a conda environment by executing:\n",
    "\n",
    "```\n",
    "# \"nano\" is conda environment name, you can use any name you like.\n",
    "conda create -n nano python=3.7 setuptools=58.0.4\n",
    "conda activate nano\n",
    "```\n",
    "> 📝 **Note**\n",
    ">\n",
    "> During your installation, there may be some warnings or errors about version, just ignore them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "# Necessary packages for inference accelaration\n",
    "!pip install --pre --upgrade bigdl-nano[pytorch,inference]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we take a pretrained ResNet18 model for example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "from torchvision.models import resnet18\n",
    "\n",
    "model = resnet18(pretrained=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## InferenceOptimizer.trace\n",
    "\n",
    "For model accelerated by ``InferenceOptimizer.trace``, usage now looks like below codes, here we just take `ipex` for example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bigdl.nano.pytorch import InferenceOptimizer\n",
    "ipex_model = InferenceOptimizer.trace(model,\n",
    "                                      use_ipex=True,\n",
    "                                      thread_num=4)\n",
    "input_sample = torch.rand(1, 3, 224, 224)\n",
    "\n",
    "with InferenceOptimizer.get_context(ipex_model):\n",
    "    output = ipex_model(input_sample)\n",
    "    assert torch.get_num_threads() == 4  # this line just to let you know Nano has provided thread control automatically : )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## InferenceOptimizer.quantize\n",
    "\n",
    "For model accelerated by ``InferenceOptimizer.quantize``, usage now looks like below codes, here we just take ``bf16 + channels_last`` for example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bigdl.nano.pytorch import InferenceOptimizer\n",
    "bf16_model = InferenceOptimizer.quantize(model,\n",
    "                                         precision='bf16',\n",
    "                                         channels_last=True,\n",
    "                                         thread_num=4)\n",
    "input_sample = torch.rand(1, 3, 224, 224)\n",
    "\n",
    "with InferenceOptimizer.get_context(bf16_model):\n",
    "    output = bf16_model(input_sample)\n",
    "    assert torch.get_num_threads() == 4  # this line just to let you know Nano has provided thread control automatically : )\n",
    "    assert output.dtype == torch.bfloat16  # this line just to let you know Nano has provided autocast context manager automatically : )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## InferenceOptimizer.optimize\n",
    "\n",
    "By calling ``optimize()``, you will get bunchs of accelerated models at the same time, then you can obtain the model you want by ``InferenceOptimizer.get_model`` or ``InferenceOptimizer.get_best_model``. Usage looks like below codes, here we just take `openvino` for example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "from pathlib import Path\n",
    "from torchvision import transforms\n",
    "from torchvision.datasets import ImageFolder\n",
    "from torchvision.datasets.utils import download_and_extract_archive\n",
    "from torch.utils.data import Subset, DataLoader\n",
    "\n",
    "def prepare_model_and_dataset(model_ft, val_size):\n",
    "    DATA_URL = \"https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip\"\n",
    "\n",
    "    train_transform = transforms.Compose([\n",
    "        transforms.Resize((224, 224)),\n",
    "        transforms.RandomHorizontalFlip(),\n",
    "        transforms.ToTensor(),\n",
    "        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])\n",
    "    ])\n",
    "\n",
    "    val_transform = transforms.Compose([\n",
    "        transforms.Resize((224, 224)),\n",
    "        transforms.ToTensor(),\n",
    "        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])\n",
    "    ])\n",
    "\n",
    "    if not Path(\"data\").exists():\n",
    "        # download dataset\n",
    "        download_and_extract_archive(url=DATA_URL, download_root=\"data\", remove_finished=True)\n",
    "\n",
    "    data_path = Path(\"data/cats_and_dogs_filtered\")\n",
    "    train_dataset = ImageFolder(data_path.joinpath(\"train\"), transform=train_transform)\n",
    "    val_dataset = ImageFolder(data_path.joinpath(\"validation\"), transform=val_transform)\n",
    "\n",
    "    indices = torch.randperm(len(val_dataset))\n",
    "    val_dataset = Subset(val_dataset, indices=indices[:val_size])\n",
    "\n",
    "    train_dataloader = DataLoader(dataset=train_dataset, batch_size=8, shuffle=True)\n",
    "    val_dataloader = DataLoader(dataset=val_dataset, batch_size=8, shuffle=False)\n",
    "\n",
    "    return train_dataset, val_dataset\n",
    "\n",
    "train_dataset, val_dataset = prepare_model_and_dataset(model, val_size=500)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# To obtain the latency of single sample, set batch_size=1\n",
    "train_dataloader = DataLoader(train_dataset, batch_size=1)\n",
    "val_dataloader = DataLoader(val_dataset)\n",
    "\n",
    "from bigdl.nano.pytorch import InferenceOptimizer\n",
    "optimizer = InferenceOptimizer()\n",
    "optimizer.optimize(model=model,\n",
    "                   training_data=train_dataloader,\n",
    "                   thread_num=4,\n",
    "                   latency_sample_num=30)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "openvino_model = optimizer.get_model(\"openvino_fp32\")\n",
    "input_sample = torch.rand(1, 3, 224, 224)\n",
    "\n",
    "with InferenceOptimizer.get_context(openvino_model):\n",
    "    output = openvino_model(input_sample)\n",
    "    assert torch.get_num_threads() == 4  # this line just to let you know Nano has provided thread control automatically : )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "accelerated_model, option = optimizer.get_best_model()\n",
    "input_sample = torch.rand(1, 3, 224, 224)\n",
    "\n",
    "with InferenceOptimizer.get_context(accelerated_model):\n",
    "    output = accelerated_model(input_sample)\n",
    "    assert torch.get_num_threads() == 4  # this line just to let you know Nano has provided thread control automatically : )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Advanced Usage: Multiple Models\n",
    "\n",
    "``InferenceOptimizer.get_context(model=...)`` can be used for muitiple models. If you have a model pipeline, you can also get a common context manager by passing multiple models to `get_context`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 📝 **Note**\n",
    "> \n",
    ">Here are some rules that how we solve conflict between multiple context managers:\n",
    ">\n",
    "> 1. If two context managers have difference precision (bf16 and non bf16), we will return AutocastContextManager()\n",
    ">\n",
    "> 2. If only one context manager have thread_num, we will set thread_num to that value\n",
    ">\n",
    "> 3. If two context managers have different thread_num, we will set thread_num to the larger one"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is a simple example just to explain the usage for pipeline:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torch import nn\n",
    "\n",
    "class Classifier(nn.Module):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        self.linear = nn.Linear(1000, 1)\n",
    "    \n",
    "    def forward(self, x):\n",
    "        return self.linear(x)\n",
    "\n",
    "classifer = Classifier()\n",
    "\n",
    "with InferenceOptimizer.get_context(ipex_model, classifer):\n",
    "    # a pipeline consists of backbone and classifier\n",
    "    x = ipex_model(input_sample)\n",
    "    output = classifer(x) \n",
    "    assert torch.get_num_threads() == 4  # this line just to let you know Nano has provided thread control automatically : )"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 📚 **Related Readings**\n",
    "> \n",
    "> - [How to install BigDL-Nano](https://bigdl.readthedocs.io/en/latest/doc/Nano/Overview/install.html)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "nano-pytorch",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.15 (default, Nov 24 2022, 21:12:53) \n[GCC 11.2.0]"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "8a5edab282632443219e051e4ade2d1d5bbc671c781051bf1437897cbdfea0f1"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
